Largest Armenian Language Data Collection Challenge Comes to Armenia

YEREVAN, Armenia - On December 1-4, the American University of Armenia (AUA) Zaven P. & Sonia Akian College of Science and Engineering (CSE) is collaborating with NVIDIA to organize the inaugural Armenian Language Data Collection Challenge. By gathering speech data in Armenian, the challenge aims to mark a milestone in bringing Armenian speech into human-machine interaction and advancing Armenian language technology.

During the 3-day challenge, participants will contribute through Mozilla Common Voice, a publicly available voice dataset that was developed in collaboration with NVIDIA. The challenge involves recording short excerpts in Armenian and validating recordings submitted by other users. At the conclusion, participants will need to submit, for review, screenshots showing the total number of clips they have recorded and validated. Those with the highest number of contributions will be awarded prizes by NVIDIA. For more information about the challenge and to register, visit the webpage.

The event aims to attract participants from various Armenian schools and universities, as well as startups and tech companies active in the field. Dr. Habet Madoyan, program chair of Data Science at CSE, underscores the importance of this initiative: “The Armenian Language Data Collection Challenge marks a significant milestone in our journey toward enhancing language technologies. With only 5 hours of Armenian voice data compared to 3,400 of English and 154 of Georgian, this initiative is not just a step, but a huge leap forward. By contributing to Mozilla’s Common Voice, we are not just gathering data, but are democratizing access, allowing the general public to test models and innovate. The collaboration with NVIDIA adds a layer of technological prowess, ensuring that the data we collect serves as a bedrock for future advancements in AI and machine learning. This project underscores our mission to ensure that the Armenian language keeps pace in the digital era. It’s not just about preserving our language; it’s about actively ensuring it plays a significant role in technology and innovation.”  

“NVIDIA supports the community of local IT specialists in Yerevan as they work together to contribute to global AI technologies,” said Nikolay Karpov, senior research scientist at NVIDIA. “This project is meant to bring together the Armenian community and encourage people to consider donating their voice for the future release of an open-source Armenian speech recognition model.”

Scheduled for this week, the Armenian Language Data Collection Challenge is expected to pave the way for future innovations in language technology, both locally and globally.

Founded in 1991, the American University of Armenia (AUA) is a private, independent university located in Yerevan, Armenia, affiliated with the University of California, and accredited by the WASC Senior College and University Commission in the United States. AUA provides local and international students with Western-style education through top-quality undergraduate and graduate degree and certificate programs, promotes research and innovation, encourages civic engagement and community service, and fosters democratic values.

Media Coverage:

[Hetq] The American University of Armenia and NVIDIA have announced a competition to enrich the Armenian voice database